attention mechanisms

Terms from Artificial Intelligence: humans at the heart of algorithms

Attention mechanisms are used in machine learning, particularly for text processing in large language models and other forms of deep learning. The concept borrows from human attention, which is critical for cognition. Given a sequence of tokens (e.g. words in a text), simple window-based methods treat all past tokens in the window as equally relevant when predicting or otherwise learning from the current token. In contrast, attention mechanisms attempt to identify the past tokens that are especially relevant and give these higher weight (attention) during training. This may be achieved by creating an interest vector for each input (learnt as part of the deep learning process) that in some way represents its topic area, and then matching the interest vectors of past tokens against that of the current token. Some attention mechanisms also retain past tokens that have proved especially relevant to later tokens, thus allowing memory beyond the window. Salience can be combined with attention by rating the unusualness of tokens and token sub-sequences and using these ratings, together with similarity to the current token, to weight tokens.
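
The "interest vector" matching described above corresponds, in Transformer terminology, to comparing a query vector for the current token against key vectors for the past tokens. A minimal sketch of this dot-product weighting in Python (the names, such as dot_product_attention, are illustrative rather than drawn from any particular library):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(query, keys, values):
    """Weight past tokens by the similarity of their key vectors to the
    current token's query ("interest") vector, then return the weighted
    sum of their value vectors."""
    d = query.shape[-1]
    # Similarity of the current token to each past token, scaled by
    # sqrt(dimension) for numerical stability (as in Transformers).
    scores = keys @ query / np.sqrt(d)
    weights = softmax(scores)        # attention weights, summing to 1
    return weights @ values, weights

# Toy example: 4 past tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 8))
values = rng.standard_normal((4, 8))
query = rng.standard_normal(8)

context, weights = dot_product_attention(query, keys, values)
print(weights)   # higher weight = more attention paid to that past token
```

Because the weights sum to one, the result is a relevance-weighted average of the past tokens' value vectors; in a real network the query, key, and value vectors are learnt projections of the token representations rather than random vectors as in this toy example.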

Used on pages 328, 572